This project is a proof of concept for the Atlanta Regional Commission that demonstrates the ability of Convolutional Neural Networks (CNNs) to accurately classify aerial imagery of the Atlanta area into specific land cover classes. This segment of the project establishes a baseline level of accuracy for five image classes: low-density residential, medium-density residential, high-density residential, commercial, and forest. Additional model tuning and imagery analysis should further increase overall accuracy.
The imagery for this part of the project is from 2010. The goal is to train a robust CNN on that imagery; when new imagery becomes available, the model can be tuned and improved alongside higher quality imaging technology.
library(tidyverse)
library(keras)
library(caret)
library(pROC)
library(imager)
library(sqldf)
library(randomForest)
The neural network structure requires images to be separated into folders based on their classification.
set.seed(314159)
filepath <- "F:\\LandPro_2010_Imagery\\fulton_2010_jpg"
files <- list.files(path = filepath)
cat111images <- files[grep("_111\\.jpg$", files)]
cat112images <- files[grep("_112\\.jpg$", files)]
cat113images <- files[grep("_113\\.jpg$", files)]
cat12images <- files[grep("_12\\.jpg$", files)]
cat40images <- files[grep("_40\\.jpg$", files)]
for (split in c("train", "validation", "test")) {
  dir.create(file.path(filepath, split))
  for (landclass in c("111", "112", "113", "12", "40")) {
    dir.create(file.path(filepath, split, landclass))
  }
}
In any statistical application such as machine learning, a key measure of a model’s efficacy is how it performs on data it hasn’t seen before. Any model can be fine-tuned to the point where it reproduces results for data it has already learned from, but the best models are those that can take new data and effectively apply the nuanced generalizations learned from previous data. Splitting the images into three sets lets us determine how well the CNN generated below can do this. The images in the Training folders are those the model will directly learn from. The Validation folder images are an intermediate step where we can see how well the network is learning: if its accuracy on the Validation images does not improve as training progresses, then we know there’s something wrong with the network structure and/or parameters that needs to be fixed.
Once the neural network model has been created, we can then go to the Test images and see whether the network has been able to “teach itself” how to generalize well enough to pick up on identification patterns in images it hasn’t seen before.
# Draw the test indices from those not already in the validation set, so each
# category splits into a full 30% validation / 30% test / 40% train with no overlap
val111 <- sample(seq_along(cat111images), round(0.3 * length(cat111images)))
test111 <- sample(setdiff(seq_along(cat111images), val111), round(0.3 * length(cat111images)))
train111 <- setdiff(seq_along(cat111images), union(val111, test111))
val112 <- sample(seq_along(cat112images), round(0.3 * length(cat112images)))
test112 <- sample(setdiff(seq_along(cat112images), val112), round(0.3 * length(cat112images)))
train112 <- setdiff(seq_along(cat112images), union(val112, test112))
val113 <- sample(seq_along(cat113images), round(0.3 * length(cat113images)))
test113 <- sample(setdiff(seq_along(cat113images), val113), round(0.3 * length(cat113images)))
train113 <- setdiff(seq_along(cat113images), union(val113, test113))
val12 <- sample(seq_along(cat12images), round(0.3 * length(cat12images)))
test12 <- sample(setdiff(seq_along(cat12images), val12), round(0.3 * length(cat12images)))
train12 <- setdiff(seq_along(cat12images), union(val12, test12))
val40 <- sample(seq_along(cat40images), round(0.3 * length(cat40images)))
test40 <- sample(setdiff(seq_along(cat40images), val40), round(0.3 * length(cat40images)))
train40 <- setdiff(seq_along(cat40images), union(val40, test40))
file.copy(file.path(filepath, cat111images[test111]), file.path(filepath, "test", "111"))
file.copy(file.path(filepath, cat111images[val111]), file.path(filepath, "validation", "111"))
file.copy(file.path(filepath, cat111images[train111]), file.path(filepath, "train", "111"))
file.copy(file.path(filepath, cat112images[test112]), file.path(filepath, "test", "112"))
file.copy(file.path(filepath, cat112images[val112]), file.path(filepath, "validation", "112"))
file.copy(file.path(filepath, cat112images[train112]), file.path(filepath, "train", "112"))
file.copy(file.path(filepath, cat113images[test113]), file.path(filepath, "test", "113"))
file.copy(file.path(filepath, cat113images[val113]), file.path(filepath, "validation", "113"))
file.copy(file.path(filepath, cat113images[train113]), file.path(filepath, "train", "113"))
file.copy(file.path(filepath, cat12images[test12]), file.path(filepath, "test", "12"))
file.copy(file.path(filepath, cat12images[val12]), file.path(filepath, "validation", "12"))
file.copy(file.path(filepath, cat12images[train12]), file.path(filepath, "train", "12"))
file.copy(file.path(filepath, cat40images[test40]), file.path(filepath, "test", "40"))
file.copy(file.path(filepath, cat40images[val40]), file.path(filepath, "validation", "40"))
file.copy(file.path(filepath, cat40images[train40]), file.path(filepath, "train", "40"))
Designating file paths for Training and Validation images for use in the creation of the neural network below.
train_dir <- file.path(filepath, "train")
validation_dir <- file.path(filepath, "validation")
CNNs work by taking a small sample of images (referred to as a batch) and learning patterns from those sample images. It then moves on to the next batch of images and learns patterns from that sample, and so on until it completes a preset number of learning cycles. The batch size will be defined below, but at this point it is useful to determine the total number of images available. That helps to determine the batch size and the number of cycles needed to accurately process the images.
numtrainingfiles <- sum(sapply(c("111", "112", "113", "12", "40"),
  function(landclass) length(list.files(file.path(filepath, "train", landclass)))))
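The step count for one full pass over the training set follows directly from these totals. As a quick arithmetic sketch (the image count here is hypothetical, not the actual Fulton County total):

```r
n_images <- 1812  # hypothetical training-image count
batch    <- 15    # the batch size used later in this document

# one step processes one batch; covering every image at least once takes:
steps_per_epoch <- ceiling(n_images / batch)
steps_per_epoch  # 121
```

Rounding up guarantees the final partial batch is not skipped.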
CNNs take each image as a grid of pixel values (three per pixel here, corresponding to the RGB channels) and slide small learned filters across it, producing a “stack” of feature maps that each respond to a different local pattern. Pooling layers then downsample these maps by keeping the strongest response in each small region, and the filter-and-pool process repeats until the maps are flattened into a one-dimensional array. Dense layers finally convert that array into a set of probabilities representing the model’s interpretation of which category each image belongs to.
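To make the pooling step concrete, here is a toy base-R sketch (not part of the model pipeline) of 2x2 max pooling on a 4x4 grid of values:

```r
# A 4x4 stand-in "image"
img <- matrix(1:16, nrow = 4, byrow = TRUE)

max_pool_2x2 <- function(m) {
  out <- matrix(0, nrow(m) / 2, ncol(m) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      # keep the strongest value in each 2x2 block
      out[i, j] <- max(m[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

max_pool_2x2(img)
```

Each 2x2 block collapses to its largest value, halving each dimension, which is exactly what the `layer_max_pooling_2d` calls below do at scale.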
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(192, 192, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1024, activation = "relu") %>%
  layer_dense(units = 5, activation = "softmax")
Setting the loss metric, optimizer and learning rate of the CNN.
Ultimately, we want the model’s effectiveness to be judged by how accurate it is. While this might seem obvious, it has to be specified explicitly for the network. As noted above, the network learns what it can from small batches of images and then moves on to the next group. With data as complex as imagery, we don’t want the model to adjust its weights too slowly or too quickly on each batch. If it learns too slowly, the processing time needed to work through thousands of pixels in thousands of images becomes prohibitively large. If it learns too quickly, it will likely skip over the nuances that distinguish one type of image from another and accuracy will suffer. The learning rate is set manually so that processing time and accuracy are balanced.
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 0.0001),
  metrics = c("acc")
)
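The effect of the learning rate can be illustrated with a single gradient-descent update on one weight; the gradient value here is invented for illustration:

```r
w        <- 0.5  # current weight
gradient <- 2.0  # hypothetical gradient of the loss with respect to w

w_small <- w - 0.0001 * gradient  # the small rate used above: a cautious step
w_large <- w - 0.5 * gradient     # an aggressive rate overshoots past zero

c(w_small, w_large)
```

A tiny rate needs many steps but stays stable; a large rate moves fast but can leap past the minimum entirely.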
In order to capture patterns in images that may not be normalized, the image generator will take the image samples and perform a number of basic manipulations on them before training the network. These manipulations include image rotation, horizontal flips, vertical flips, and zooms. The reason behind this is that patterns in aerial imagery are not often found in the same orientation from one image to the next. For example, the letter “P” is only recognizable when it is written in a certain way. When someone flips the letter “P” vertically, it is no longer a “P” and is now a “b”. Patterns that are easily identified by humans for aerial imagery purposes aren’t subject to that constraint. A large industrial complex is still a large industrial complex no matter how you orient it. The training generator helps the network to “learn” this.
train_datagen <- image_data_generator(
  rescale = 1/255,
  horizontal_flip = TRUE,
  vertical_flip = TRUE,
  zoom_range = 0.2,
  rotation_range = 15,
  fill_mode = "reflect"
)
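As a toy illustration of why these manipulations are safe for aerial tiles, a horizontal flip reverses the column order of a pixel grid without losing any pixel values:

```r
tile <- matrix(1:9, nrow = 3, byrow = TRUE)  # stand-in for a small pixel grid
flipped <- tile[, ncol(tile):1]              # horizontal flip

# same pixels, different orientation -- the "content" is unchanged
all(sort(flipped) == sort(tile))
```

The flipped tile depicts the same land, so its class label is still valid, which is what lets the generator multiply the effective training set.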
There is no need to perform image manipulation on the validation images since these are not directly part of the learning process.
validation_datagen <- image_data_generator(rescale = 1/255)
The CNN will process images in groups of 15 in order to learn identification patterns.
batchsize <- 15
Each pixel in an image has three values (corresponding to the RGB channels), and each image can contain hundreds of thousands or millions of pixels. Because of the mathematics involved, CNN processing time grows rapidly with the number of input values, and working with millions of pixel values per image is simply not feasible from a computational standpoint. As such, it is very important to scale the images down in size (in this case, to 192x192) so that the network can train itself in a reasonable amount of time. Of course, just as it’s harder for humans to distinguish features in smaller images, the same is true of CNNs, so again there’s a balance to be struck between prohibitively long processing time and overall accuracy.
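The arithmetic behind that trade-off is easy to sketch; the full-resolution figure below is illustrative, not the actual LandPro tile size:

```r
full_res   <- 1000 * 1000 * 3  # hypothetical 1-megapixel tile, 3 RGB channels
downscaled <- 192 * 192 * 3    # the input size fed to the network below

c(full_res, downscaled, round(full_res / downscaled, 1))
```

Even a modest megapixel tile carries roughly 27 times more input values than the downscaled version the network actually sees.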
train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(192, 192),
  batch_size = batchsize,
  class_mode = "categorical"
)
validation_generator <- flow_images_from_directory(
  validation_dir,
  validation_datagen,
  target_size = c(192, 192),
  batch_size = batchsize,
  class_mode = "categorical"
)
batch <- generator_next(train_generator)
Setting the number of epochs and the number of steps within each epoch. Prior testing has indicated that accuracy does not improve significantly beyond roughly 40 epochs.
numvalidationfiles <- sum(sapply(c("111", "112", "113", "12", "40"),
  function(landclass) length(list.files(file.path(filepath, "validation", landclass)))))

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = round(numtrainingfiles / batchsize),
  epochs = 40,
  validation_data = validation_generator,
  validation_steps = round(numvalidationfiles / batchsize)  # validation count, not the training count
)
Note the training accuracy of 80% and the validation accuracy of 78%, indicating good generalization and little of the overfitting we want to avoid.
history
## Trained on NULL samples (batch_size=NULL, epochs=50)
## Final epoch (plot to see history):
## val_loss: 0.5685
## val_acc: 0.7874
## loss: 0.4741
## acc: 0.8029
test_dir <- file.path(filepath, "test")
test_datagen <- image_data_generator(
rescale = 1/255
)
test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,
  color_mode = "rgb",
  target_size = c(192, 192),
  batch_size = 16,
  class_mode = "categorical",
  shuffle = FALSE
)
model %>% evaluate_generator(test_generator, steps = 340)  # 340 batches of 16 covers every test image
For each test image this outputs five values: the model’s estimated probability that the image belongs to each of the five land use categories.
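One hypothetical row of that output (the probabilities here are invented) shows how a class prediction falls out of the five values:

```r
classes <- c("111", "112", "113", "12", "40")
probs   <- c(0.07, 0.61, 0.22, 0.06, 0.04)  # invented softmax output for one image

stopifnot(abs(sum(probs) - 1) < 1e-9)  # softmax probabilities sum to 1
classes[which.max(probs)]              # the largest probability wins: "112"
```

The mapping code below does exactly this for every test image at once.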
preds <- predict_generator(model,
test_generator,
steps = 340)
predictions <- data.frame(test_generator$filenames)
predictions <- cbind(predictions, preds)
names(predictions) <- c("filename", "cat111prob", "cat112prob", "cat113prob", "cat12prob", "cat40prob")
predictions$classprediction <- apply(predictions[, 2:6], 1, which.max)
predictions$classprediction <- case_when(predictions$classprediction == 1 ~ "111",
predictions$classprediction == 2 ~ "112",
predictions$classprediction == 3 ~ "113",
predictions$classprediction == 4 ~ "12",
predictions$classprediction == 5 ~ "40"
)
# The class folder is the first path component of each generator filename,
# which is more robust than searching for digit patterns in the tile names
predictions$classactual <- sub("[/\\\\].*$", "", predictions$filename)
predictions$classactual <- as.factor(predictions$classactual)
predictions$classprediction <- as.factor(predictions$classprediction)
confusionMatrix(predictions$classprediction, predictions$classactual)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 111 112 113 12 40
## 111 679 257 11 2 0
## 112 491 1894 308 39 0
## 113 1 9 71 1 0
## 12 1 2 4 204 5
## 40 10 2 9 11 1422
##
## Overall Statistics
##
## Accuracy : 0.7859
## 95% CI : (0.7748, 0.7968)
## No Information Rate : 0.3983
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6891
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: 111 Class: 112 Class: 113 Class: 12 Class: 40
## Sensitivity 0.5745 0.8752 0.17618 0.79377 0.9965
## Specificity 0.9365 0.7437 0.99781 0.99768 0.9920
## Pos Pred Value 0.7155 0.6933 0.86585 0.94444 0.9780
## Neg Pred Value 0.8878 0.9000 0.93796 0.98984 0.9987
## Prevalence 0.2176 0.3983 0.07418 0.04730 0.2627
## Detection Rate 0.1250 0.3486 0.01307 0.03755 0.2617
## Detection Prevalence 0.1747 0.5029 0.01509 0.03976 0.2676
## Balanced Accuracy 0.7555 0.8094 0.58700 0.89573 0.9943
The confusion matrix shows that most errors occur between adjacent residential density levels (classes 111, 112, and 113), which is the hardest distinction in the set. It is promising to see very few gross errors, such as “high density residential” (category 113) being mistaken for “low density residential” (category 111) or vice versa.
Note the table below, which tallies the incorrect classifications for each category. The network generates far fewer misclassifications for some categories (such as “forest”, category 40) than others. This is to be expected: residential densities aren’t binary designations, so for those classes the model not only has to determine what features are present, but also how many.
wrongPredictions <- predictions[predictions$classprediction != predictions$classactual,]
table(wrongPredictions$classactual, wrongPredictions$classprediction)
##
## 111 112 113 12 40
## 111 0 491 1 1 10
## 112 257 0 9 2 2
## 113 11 308 0 4 9
## 12 2 39 1 0 11
## 40 0 0 0 5 0
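The per-class error rates implied by such a tally can be computed directly; the counts in this sketch are invented, not the Fulton County results:

```r
# Toy confusion table (rows = actual, cols = predicted)
cm <- matrix(c(80, 15,  5,
               10, 85,  5,
                2,  8, 90),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = c("A", "B", "C"),
                             predicted = c("A", "B", "C")))

errors <- rowSums(cm) - diag(cm)  # off-diagonal counts per actual class
round(errors / rowSums(cm), 2)    # per-class error rates
</imports>
```

Comparing these rates across classes is what reveals which categories the network finds hardest.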
Of course, it’s still very important to look at some of the mistaken images to see if the model’s incorrect predictions still make sense to the human eye.
For reference:
wrongPredictions <- wrongPredictions[sample(1:nrow(wrongPredictions), 25),]
plotImage <- function(imagenum) {
  myImage <- load.image(file.path(test_dir, wrongPredictions$filename[imagenum]))
  myImage <- resize(myImage, 500, 500)
  plot(myImage, axes = FALSE)
  legend(0.5, 0, bty = "n", text.font = 2, text.col = "white",
         legend = c(paste0("Predicted class: ", wrongPredictions$classprediction[imagenum]),
                    paste0("Actual class: ", wrongPredictions$classactual[imagenum]),
                    paste0("Probability 111: ", round(wrongPredictions$cat111prob[imagenum], 3)),
                    paste0("Probability 112: ", round(wrongPredictions$cat112prob[imagenum], 3)),
                    paste0("Probability 113: ", round(wrongPredictions$cat113prob[imagenum], 3)),
                    paste0("Probability 12: ", round(wrongPredictions$cat12prob[imagenum], 3)),
                    paste0("Probability 40: ", round(wrongPredictions$cat40prob[imagenum], 3))))
}
sapply(1:25, plotImage)
For each aerial image, we have its Northing and Easting coordinates. We will use those coordinates along with the class probabilities from the CNN above to see if we can further improve the classification accuracy.
The rationale behind including this step is twofold:
Due to the way in which land is zoned, it is uncommon to find certain combinations of classifications adjacent to one another. For example, due to regulations it’s unlikely that one plot of land will be zoned for industrial use and the adjacent lot for high-density residential. Adding coordinate data will hopefully pick up on that trend.
The definitions of low-density, medium-density, and high-density residential are not based on one image alone. These images are not classified in a vacuum; each should be interpreted as part of a larger grouping covering an entire neighborhood. It is therefore entirely possible for an image to look low-density while actually being classified high-density, simply because it sits on the edge of a dense neighborhood. On a single-image basis, a CNN is very unlikely to ever pick up on that. Adding coordinate data should help it do so.
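The coordinate extraction below leans on the ESRI world-file (.tfw) layout, in which line 5 holds the X (easting) coordinate and line 6 the Y (northing) coordinate of the upper-left pixel. A self-contained sketch with a fabricated file (real LandPro values will differ):

```r
tfw <- c("1.0", "0.0", "0.0", "-1.0", "2200000.25", "1350000.75")  # fabricated values
tmp <- tempfile(fileext = ".tfw")
writeLines(tfw, tmp)

lines <- readLines(tmp)
easting  <- as.numeric(lines[5])  # line 5: X coordinate of the upper-left pixel
northing <- as.numeric(lines[6])  # line 6: Y coordinate of the upper-left pixel
c(easting, northing)
```

Lines 1 through 4 carry the pixel size and rotation terms, which this project does not need.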
path <- "F:\\LandPro_2010_Imagery\\"
tfwFiles <- list.files(path, pattern = "tfw", recursive = TRUE)
tfwFiles <- paste(path, tfwFiles, sep = "")
extract <- function(tfwFile) {
  t <- readLines(tfwFile)
  t <- t[c(5, 6)]
  name <- gsub(".*[[:digit:]]/", "", tfwFile)
  name <- gsub("tfw", "jpg", name)
  t <- c(t, name)
  code <- str_match(tfwFile, "[[:digit:]]+\\.tfw")
  code <- gsub("\\.tfw", "", code)
  code <- gsub(".*_", "", code)
  t <- c(t, code)
  return(t)
}
NEMatrix <- sapply(1:length(tfwFiles), function(x) extract(tfwFiles[x]))
NEMatrix <- as.data.frame(t(NEMatrix))
# In a world (.tfw) file, line 5 is the X (easting) coordinate and line 6 the Y (northing)
names(NEMatrix) <- c("Easting", "Northing", "ImageName", "ImageCat")
options(digits = 18)
NEMatrix$Northing <- as.numeric(as.character(NEMatrix$Northing))
NEMatrix$Easting <- as.numeric(as.character(NEMatrix$Easting))
predictions$filename <- gsub("[[:digit:]]+\\\\", "", predictions$filename)
pNE <- sqldf("SELECT p.filename, p.cat111prob, p.cat112prob, p.cat113prob, p.cat12prob, p.cat40prob,
                     ne.Northing, ne.Easting, p.classprediction, p.classactual
              FROM predictions p
              JOIN NEMatrix ne ON p.filename = ne.ImageName")
pNE <- pNE[, c("cat111prob", "cat112prob", "cat113prob", "cat12prob", "cat40prob",
               "Northing", "Easting", "classactual")]
pNE$classactual <- as.factor(pNE$classactual)  # ensure a factor response for classification
The RandomForest model takes the probability outputs from the neural network and the NE coordinates and creates a series of decision trees that lead to a final classification for each image. The grid-like nature of zoning regulations and land use makes a decision-tree based algorithm well suited for this data.
intrain <- createDataPartition(pNE$classactual, p = 0.7, list = FALSE)
trainSet <- pNE[intrain,]
testSet <- pNE[-intrain,]
nerfModel <- randomForest(classactual ~ .,
                          data = trainSet,
                          ntree = 700,  # the argument is ntree; "nTree" is silently ignored
                          mtry = 3,
                          importance = TRUE)
With coordinate data, overall accuracy improved by approximately 8 percentage points. That might not sound like much at first, but considering that the error rate dropped from 21% to 13%, it is a very significant improvement. Put another way, it is a 38% decrease in misclassifications.
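The arithmetic behind those figures:

```r
before <- 0.21  # error rate with the CNN alone
after  <- 0.13  # error rate with coordinates added

accuracy_gain <- before - after            # 8 percentage points
relative_drop <- (before - after) / before # share of misclassifications eliminated
round(c(accuracy_gain, relative_drop), 2)
```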
nerfPred <- predict(nerfModel, testSet)
confusionMatrix(nerfPred, testSet$classactual)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 111 112 113 12 40
## 111 282 59 1 0 0
## 112 69 572 45 4 0
## 113 0 16 67 2 0
## 12 0 1 6 71 0
## 40 3 1 1 0 428
##
## Overall Statistics
##
## Accuracy : 0.872235872235872
## 95% CI : (0.85503975053596, 0.888076750276268)
## No Information Rate : 0.398648648648649
## P-Value [Acc > NIR] : < 2.22044604925e-16
##
## Kappa : 0.819878551641085
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 111 Class: 112
## Sensitivity 0.796610169491525 0.881355932203390
## Specificity 0.952904238618524 0.879468845760981
## Pos Pred Value 0.824561403508772 0.828985507246377
## Neg Pred Value 0.944012441679627 0.917910447761194
## Prevalence 0.217444717444717 0.398648648648649
## Detection Rate 0.173218673218673 0.351351351351351
## Detection Prevalence 0.210073710073710 0.423832923832924
## Balanced Accuracy 0.874757204055025 0.880412388982185
## Class: 113 Class: 12
## Sensitivity 0.5583333333333333 0.9220779220779221
## Specificity 0.9880636604774535 0.9954867827208252
## Pos Pred Value 0.7882352941176464 0.9102564102564097
## Neg Pred Value 0.9656513285806870 0.9961290322580645
## Prevalence 0.0737100737100737 0.0472972972972973
## Detection Rate 0.0411547911547912 0.0436117936117936
## Detection Prevalence 0.0522113022113022 0.0479115479115479
## Balanced Accuracy 0.7731984969053934 0.9587823523993737
## Class: 40
## Sensitivity 1.000000000000000
## Specificity 0.995833333333333
## Pos Pred Value 0.988452655889146
## Neg Pred Value 1.000000000000000
## Prevalence 0.262899262899263
## Detection Rate 0.262899262899263
## Detection Prevalence 0.265970515970516
## Balanced Accuracy 0.997916666666667
Achieving maximum accuracy is probably not the end goal for this project. Given the number of arbitrary classification rules and government regulations that are nearly impossible to quantify in a model such as this, it is unlikely that a CNN-based model would ever entirely replace an experienced employee in classifying these images. However, it is likely possible to eliminate a large number of images that the model is certain of and pass only those it is unsure about to a human eye for final classification. Even if we can only eliminate 50% of the images as ones the model is very confident about, that still corresponds to 1000 man-hours saved for the agency.
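That triage idea can be sketched with a confidence threshold on the model’s top probability per image; the probabilities and the 90% cutoff here are invented for illustration:

```r
probs <- rbind(c(0.97, 0.01, 0.01, 0.01, 0.00),
               c(0.40, 0.35, 0.15, 0.05, 0.05),
               c(0.02, 0.02, 0.02, 0.01, 0.93))  # one invented row per image

confident <- apply(probs, 1, max) >= 0.9  # hypothetical confidence cutoff
sum(confident)   # images labeled automatically
sum(!confident)  # images routed to a human reviewer
```

Tuning that cutoff against the confusion matrix would let the agency trade review workload against acceptable error.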